Simulating Logical Calculi with Tensors
Abstract

The development of compositional distributional models of semantics
reconciling the empirical aspects of distributional semantics with the
compositional aspects of formal semantics is a popular topic in the
contemporary literature. This paper seeks to bring this reconciliation
one step further by showing how the mathematical constructs commonly
used in compositional distributional models, such as tensors and
matrices, can be used to simulate different aspects of predicate logic.

This paper discusses how the canonical isomorphism between tensors and
multilinear maps can be exploited to simulate a full-blown
quantifier-free predicate calculus using tensors. It provides tensor
interpretations of the set of logical connectives required to model
propositional calculi. It suggests a variant of these tensor calculi
capable of modelling quantifiers, using few non-linear operations. It
finally discusses the relation between these variants, and how this
relation should constitute the subject of future work.

1 Introduction 

The topic of compositional distributional semantics has been growing in
popularity over the past few years. This emerging sub-field of natural
language semantic modelling seeks to combine two seemingly orthogonal
approaches to modelling the meaning of words and sentences, namely
formal semantics and distributional semantics.

These approaches, summarised in Section 2, differ in that formal
semantics, on the one hand, provides a neatly compositional picture of
natural language meaning, reducing sentences to logical
representations; on the other hand, distributional semantics accounts
for the ever-present ambiguity and polysemy of words of natural
language, and provides tractable ways of learning and comparing word
meanings based on corpus data.

Recent efforts, some of which are briefly reported below, have been
made to unify both of these approaches to language modelling to produce
compositional distributional models of semantics, leveraging the
learning mechanisms of distributional semantics, and providing
syntax-sensitive operations for the production of representations of
sentence meaning obtained through combination of corpus-inferred word
meanings. These efforts have been met with some success in evaluations
such as phrase similarity tasks (Mitchell and Lapata, 2008; Mitchell
and Lapata, 2009; Grefenstette and Sadrzadeh, 2011; Kartsaklis et al.,
2012), sentiment prediction (Socher et al., 2012), and paraphrase
detection (Blacoe and Lapata, 2012).



While these developments are promising with regard to the goal of
obtaining learnable-yet-structured sentence-level representations of
language meaning, part of the motivation for unifying formal and
distributional models of semantics has been lost. The compositional
aspects of formal semantics are combined with the corpus-based
empirical aspects of distributional semantics in such models, yet the
logical aspects are not. But it is these logical aspects which are so
appealing in formal semantic models, and therefore it would be
desirable to replicate the inferential powers of logic within
compositional distributional models of semantics.

In this paper, I make steps towards addressing this lost connection
with logic in compositional distributional semantics. In Section 2, I
provide a brief overview of formal and distributional semantic models
of meaning. In Section 3, I give mathematical foundations for the rest
of the paper by introducing tensors and tensor contraction as a way of
modelling multilinear functions. In Section 4, I discuss how
predicates, relations, and logical atoms of a quantifier-free predicate
calculus can be modelled with tensors. In Section 5, I present
tensorial representations of logical operations for a complete
propositional calculus. In Section 6, I discuss a variant of the
predicate calculus from Section 4 aimed at modelling quantifiers within
such tensor-based logics, and the limits of compositional formalisms
based only on multilinear maps. I conclude, in Section 7, by suggesting
directions for further work based on the contents of this paper.

This paper does not seek to address the question of how to determine
how words should be translated into predicates and relations in the
first place, but rather shows how such predicates and relations can be
modelled using multilinear algebra. As such, it can be seen as a
general theoretical contribution which is independent from the
approaches to compositional distributional semantics it can be applied
to. It is directly compatible with the efforts of Coecke et al. (2010)
and Grefenstette et al. (2013), discussed below, but is also relevant
to any other approach making use of tensors or matrices to encode
semantic relations.

2 Related work

Formal semantics, from the Montagovian school of thought (Montague,
1974; Dowty et al., 1981), treats natural languages as programming
languages which compile down to some formal language such as a
predicate calculus. The syntax of natural languages, in the form of a
grammar, is augmented by semantic interpretations, in the form of
expressions from a higher order logic such as the lambda-beta calculus.
The parse of a sentence then determines the combinations of
lambda-expressions, the reduction of which yields a well-formed formula
of a predicate calculus, corresponding to the semantic representation
of the sentence. A simple formal semantic model is illustrated in
Figure 1.



Syntactic Analysis         Semantic Interpretation

S  => NP VP                [[VP]]([[NP]])
NP => cats, milk, etc.     [[cats]], [[milk]], ...
VP => Vt NP                [[Vt]]([[NP]])
Vt => like, hug, etc.      λyx.[[like]](x, y), ...

              [[like]]([[cats]], [[milk]])
               /                      \
         [[cats]]          λx.[[like]](x, [[milk]])
                             /                  \
                 λyx.[[like]](x, y)           [[milk]]

Figure 1: A simple formal semantic model.

Formal semantic models are incredibly powerful, in that the resulting
logical representations of sentences can be fed to automated theorem
provers to perform textual inference, consistency verification,
question answering, and a host of other tasks which are well developed
in the literature (e.g. see (Loveland, 1978) and (Fitting, 1996)).
However, the sophistication of such formal semantic models comes at a
cost: the complex set of rules allowing for the logical interpretation
of text must either be provided a priori, or learned. Learning such
representations is a complex task, the difficulty of which is
compounded by issues of ambiguity and polysemy which are pervasive in
natural languages.

In contrast, distributional semantic models, best summarised by the
dictum of Firth (1957) that "You shall know a word by the company it
keeps," provide an elegant and tractable way of learning semantic
representations of words from text. Word meanings are modelled as
high-dimensional vectors in large semantic vector spaces, the basis
elements of which correspond to contextual features such as other words
from a lexicon. Semantic vectors for words are built by counting how
many times a target word occurs within a context (e.g. within k words
of select words from the lexicon). These context counts are then
normalised by a term frequency-inverse document frequency-like measure
(e.g. TF-IDF, pointwise mutual information, ratio of probabilities),
and are set as the basis weights of the vector representation of the
word's meaning. Word vectors can then be compared using geometric
distance metrics such as cosine similarity, allowing us to determine
the similarity of words, cluster semantically related words, and so on.
Excellent overviews of distributional semantic models are provided by
Curran (2004) and Mitchell (2011). A simple distributional semantic
model showing the spatial representation of words 'dog', 'cat' and
'snake' within the context of feature words 'pet', 'furry', and
'stroke' is shown in Figure 2.

[Plot: the words 'dog', 'cat' and 'snake' as vectors along the axes
'pet', 'furry' and 'stroke'.]

Figure 2: A simple distributional semantic model.
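
The construction of count-based word vectors and their comparison by
cosine similarity, as described above, can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
toy corpus, the window size k = 2 and the helper names `context_vector`
and `cosine` are all hypothetical, and the raw counts are used directly
without the TF-IDF-like reweighting mentioned in the text.

```python
import math

def context_vector(target, tokens, features, k):
    """Count how often each feature word occurs within k tokens of the target."""
    counts = [0] * len(features)
    for i, token in enumerate(tokens):
        if token != target:
            continue
        window = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
        for word in window:
            if word in features:
                counts[features.index(word)] += 1
    return counts

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus, invented purely for illustration.
tokens = ("we stroke the dog . we stroke the cat . the furry dog . "
          "the furry cat . a pet dog . a pet cat . a pet snake").split()
features = ["pet", "furry", "stroke"]

dog = context_vector("dog", tokens, features, k=2)
cat = context_vector("cat", tokens, features, k=2)
snake = context_vector("snake", tokens, features, k=2)

print(dog, cat, snake)
print(cosine(dog, cat), cosine(dog, snake))
```

In this toy corpus 'dog' and 'cat' share all three contexts while
'snake' only shares 'pet', so the cosine between 'dog' and 'cat' is
larger than the cosine between 'dog' and 'snake'.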

Distributional semantic models have been successfully applied to tasks
such as word-sense discrimination (Schütze, 1998), thesaurus extraction
(Grefenstette, 1994), and automated essay marking (Landauer and Dumais,
1997). However, while such models provide tractable ways of learning
and comparing word meanings, they do not naturally scale beyond word
length. As recently pointed out by Turney (2012), treating larger
segments of texts as lexical units and learning their representations
distributionally (the 'holistic approach') violates the principle of
linguistic creativity, according to which we can formulate and
understand phrases which we've never observed before, provided we know
the meaning of their parts and how they are combined. As such,
distributional semantics makes no effort to account for the
compositional nature of language like formal semantics does, and
ignores issues relating to syntactic and relational aspects of
language.

Several proposals have been put forth over the last few years to
provide vector composition functions for distributional models in order
to introduce compositionality, thereby replicating some of the aspects
of formal semantics while preserving learnability. Simple operations
such as vector addition and multiplication, with or without scalar or
matrix weights (to take word order or basic relational aspects into
account), have been suggested (Zanzotto et al., 2010; Mitchell and
Lapata, 2008; Mitchell and Lapata, 2009).



Smolensky (1990) suggests using the tensor product of word vectors to
produce representations that grow with sentence complexity. Clark and
Pulman (2006) extend this approach by including basis vectors standing
for dependency relations into tensor product-based representations.
Both of these tensor product-based approaches run into dimensionality
problems as representations of sentence meaning for sentences of
different lengths or grammatical structure do not live in the same
space, and thus cannot directly be compared. Coecke et al. (2010)
develop a framework using category theory, solving this dimensionality
problem of tensor-based models by projecting tensored vectors for
sentences into a unique vector space for sentences, using functions
dynamically generated by the syntactic structure of the sentences. In
presenting their framework, which partly inspired this paper, they
describe how a verb can be treated as a logical relation using tensors
in order to evaluate the truth value of a simple sentence, as well as
how negation can be modelled using matrices.



A related approach, by Baroni and Zamparelli (2010), represents unary
relations such as adjectives as matrices learned by linear regression
from corpus data, and models adjective-noun composition as
matrix-vector multiplication. Grefenstette et al. (2013) generalise
this approach to relations of any arity and relate it to the framework
of Coecke et al. (2010) using a tensor-based approach to formal
semantic modelling similar to that presented in this paper.

Finally, Socher et al. (2012) apply deep learning techniques to model
syntax-sensitive vector composition using non-linear operations,
effectively turning parse trees into multi-stage neural networks.
Socher shows that the non-linear activation function used in such a
neural network can be tailored to replicate the behaviour of basic
logical connectives such as conjunction and negation.



3 Tensors and multilinear maps 

Tensors are the mathematical objects dealt with in multilinear algebra
just as vectors and matrices are the objects dealt with in linear
algebra. In fact, tensors can be seen as generalisations of vectors and
matrices by introducing the notion of tensor rank. Let the rank of a
tensor be the number of indices required to describe a
vector/matrix-like object in sum notation. A vector $\mathbf{v}$ in a
space $V$ with basis $\{\mathbf{b}_i^V\}_i$ can be written as the
weighted sum of the basis vectors:

$$\mathbf{v} = \sum_i c_i^{v}\, \mathbf{b}_i^V$$

where the $c_i^{v}$ elements are the scalar basis weights of the
vector. Being fully described with one index, vectors are rank 1
tensors. Similarly, a matrix $\mathbf{M}$ is an element of a space
$V \otimes W$ with basis $\{(\mathbf{b}_i^V, \mathbf{b}_j^W)\}_{ij}$
(such pairs of basis vectors of $V$ and $W$ are commonly written as
$\{\mathbf{b}_i^V \otimes \mathbf{b}_j^W\}_{ij}$ in multilinear
algebra). Such matrices are rank 2 tensors, as they can be fully
described using two indices (one for rows, one for columns):

$$\mathbf{M} = \sum_{ij} c_{ij}^{M}\, \mathbf{b}_i^V \otimes \mathbf{b}_j^W$$

where the scalar weights $c_{ij}^{M}$ are just the $ij$th elements of
the matrix.

A tensor $\mathbf{T}$ of rank $k$ is just a geometric object with a
higher rank. Let $\mathbf{T}$ be a member of
$V_1 \otimes \ldots \otimes V_k$; we can express $\mathbf{T}$ as
follows, using $k$ indices $\alpha_1 \ldots \alpha_k$:

$$\mathbf{T} = \sum_{\alpha_1 \ldots \alpha_k} c^{T}_{\alpha_1 \ldots \alpha_k}\, \mathbf{b}^{V_1}_{\alpha_1} \otimes \ldots \otimes \mathbf{b}^{V_k}_{\alpha_k}$$

In this paper, we will be dealing with tensors of rank 1 (vectors),
rank 2 (matrices) and rank 3, which can be pictured as cuboids (or a
matrix of matrices).

Tensor contraction is an operation which allows us to take two tensors
and produce a third. It is a generalisation of inner products and
matrix multiplication to tensors of higher ranks. Let $\mathbf{T}$ be a
tensor in $V_1 \otimes \ldots \otimes V_j \otimes V_k$ and $\mathbf{U}$
be a tensor in $V_k \otimes V_m \otimes \ldots \otimes V_n$. The
contraction of these tensors, written $\mathbf{T} \times \mathbf{U}$,
corresponds to the following calculation:

$$\mathbf{T} \times \mathbf{U} = \sum_{\alpha_1 \ldots \alpha_n} c^{T}_{\alpha_1 \ldots \alpha_k}\, c^{U}_{\alpha_k \ldots \alpha_n}\, \mathbf{b}^{V_1}_{\alpha_1} \otimes \ldots \otimes \mathbf{b}^{V_j}_{\alpha_j} \otimes \mathbf{b}^{V_m}_{\alpha_m} \otimes \ldots \otimes \mathbf{b}^{V_n}_{\alpha_n}$$

Tensor contraction takes a tensor of rank $k$ and a tensor of rank
$n - k + 1$ and produces a tensor of rank $n - 1$, corresponding to the
sum of the ranks of the input tensors minus 2. The tensors must satisfy
the following restriction: the left tensor must have a rightmost index
spanning the same number of dimensions as the leftmost index of the
right tensor. This is similar to the restriction that an $m$ by $n$
matrix can only be multiplied with a $p$ by $q$ matrix if $n = p$, i.e.
if the index spanning the columns of the first matrix covers the same
number of columns as the index spanning the rows of the second matrix
covers rows. Similarly to how the columns of one matrix 'merge' with
the rows of another to produce a third matrix, the part of the first
tensor spanned by the index $k$ merges with the part of the second
tensor spanned by $k$ by 'summing through' the shared basis elements
$\mathbf{b}^{V_k}_{\alpha_k}$ of each tensor. Each tensor therefore
loses a rank while being joined, explaining how the tensor produced by
$\mathbf{T} \times \mathbf{U}$ is of rank
$k + (n - k + 1) - 2 = n - 1$.
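
As a minimal sketch of the contraction just described, the following
Python code represents tensors as nested lists and contracts the
rightmost index of one tensor with the leftmost index of another. The
nested-list encoding and the helper names `scale`, `add`, `rank` and
`contract` are assumptions made for illustration; they are not part of
the formalism itself.

```python
def scale(c, t):
    """Multiply every entry of a nested-list tensor t by the scalar c."""
    return [scale(c, s) for s in t] if isinstance(t, list) else c * t

def add(a, b):
    """Entry-wise sum of two nested-list tensors of the same shape."""
    return [add(x, y) for x, y in zip(a, b)] if isinstance(a, list) else a + b

def rank(t):
    """Number of indices needed to address an entry of t."""
    return 1 + rank(t[0]) if isinstance(t, list) else 0

def contract(t, u):
    """Contract the rightmost index of t with the leftmost index of u."""
    if isinstance(t[0], list):           # not yet at t's rightmost index
        return [contract(s, u) for s in t]
    if len(t) != len(u):
        raise ValueError("shared index must span the same number of dimensions")
    total = scale(t[0], u[0])            # 'sum through' the shared index
    for weight, part in zip(t[1:], u[1:]):
        total = add(total, scale(weight, part))
    return total

inner = contract([1, 2, 3], [4, 5, 6])                   # inner product
product = contract([[1, 2], [3, 4]], [[0, 1], [1, 0]])   # matrix multiplication
cube = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]              # a 2x2x2 rank 3 tensor

print(inner, product, rank(contract(cube, product)))
```

The inner product and matrix multiplication fall out as special cases,
and contracting the rank 3 tensor with a rank 2 tensor yields a tensor
of rank 3 + 2 - 2 = 3, in line with the rank count given above.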
There exists an isomorphism between tensors and multilinear maps
(Bourbaki, 1989; Lee, 1997), such that any curried multilinear map

$$f : V_1 \to \ldots \to V_j \to V_k$$

can be represented as a tensor
$\mathbf{T}^f \in V_k \otimes V_j \otimes \ldots \otimes V_1$ (note the
reversed order of the vector spaces), with tensor contraction acting as
function application. This isomorphism guarantees that there exists
such a tensor $\mathbf{T}^f$ for every $f$, such that the following
equality holds for any $\mathbf{v}_1 \in V_1, \ldots, \mathbf{v}_j \in V_j$:

$$f\,\mathbf{v}_1 \ldots \mathbf{v}_j = \mathbf{v}_k = \mathbf{T}^f \times \mathbf{v}_1 \times \ldots \times \mathbf{v}_j$$
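
To make the correspondence concrete, here is a small sketch comparing a
curried bilinear map with its tensor. The particular map `f`, its
coefficients and the test vectors are hypothetical choices; the only
point illustrated is that the argument applied first occupies the
rightmost index of the tensor, so that successive contractions mimic
successive function applications.

```python
def contract(t, v):
    """Contract the rightmost index of nested-list tensor t with vector v."""
    if isinstance(t[0], list):
        return [contract(s, v) for s in t]
    return sum(a * b for a, b in zip(t, v))

def f(v1):
    """Hypothetical curried bilinear map f : V1 -> V2 -> R (dim V1 = 2, dim V2 = 3)."""
    def apply_second(v2):
        return (v1[0] + 2 * v1[1]) * v2[0] + 3 * v1[0] * v2[1] - v1[1] * v2[2]
    return apply_second

# Tensor for f, indexed [V2][V1]: the space of the first argument comes last.
T_f = [[1, 2],
       [3, 0],
       [0, -1]]

v1, v2 = [2, 5], [1, 4, 7]
via_function = f(v1)(v2)
via_tensor = contract(contract(T_f, v1), v2)

print(via_function, via_tensor)
```

Both routes give the same scalar, and the intermediate result
`contract(T_f, v1)` is itself a vector standing for the partially
applied function `f(v1)`.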

4 Tensor-based predicate calculi 

In this section, I discuss how the isomorphism be- 
tween multilinear maps and tensors described above 
can be used to model predicates, relations, and log- 
ical atoms of a predicate calculus. The four aspects 
of a predicate calculus we must replicate here us- 
ing tensors are as follows: truth values, the logical 
domain and its elements (logical atoms), predicates, 
and relations. I will discuss logical connectives in 
the next section. 

Both truth values and domain objects are the basic elements of a
predicate calculus, and therefore it makes sense to model them as
vectors rather than higher rank tensors, which I will reserve for
relations. We first must consider the vector space used to model the
boolean truth values of $B$. Coecke et al. (2010) suggest, as boolean
vector space, the space $\mathbf{B}$ with the basis $\{\top, \bot\}$,
where $\top = [1\ 0]^T$ is interpreted as 'true', and
$\bot = [0\ 1]^T$ as 'false'.

I assign to the domain $D$, the set of objects in our logic, a vector
space $\mathbf{D}$ on $\mathbb{R}^{|D|}$ with basis vectors
$\{\mathbf{d}_i\}_i$ which are in bijective correspondence with
elements of $D$. An element of $D$ is therefore represented as a
one-hot vector in $\mathbf{D}$, the single non-null value of which is
the weight for the basis vector mapped to that element of $D$.
Similarly, a subset of $D$ is a vector of $\mathbf{D}$ where those
elements of $D$ in the subset have 1 as their corresponding basis
weights in the vector, and those not in the subset have 0. Therefore
there is a one-to-one correspondence between the vectors in
$\mathbf{D}$ and the elements of the power set $\mathcal{P}(D)$,
provided the basis weights of the vectors are restricted to one of 0
or 1.

Each unary predicate $P$ in the logic is represented in the logical
model as a set $M_P \subseteq D$ containing the elements of the domain
for which the predicate is true. Predicates can be viewed as a unary
function $f_P : D \to B$ where

$$f_P(x) = \begin{cases} \top & \text{if } x \in M_P \\ \bot & \text{otherwise} \end{cases}$$

These predicate functions can be modelled as rank 2 tensors in
$\mathbf{B} \otimes \mathbf{D}$, i.e. matrices. Such a matrix
$\mathbf{M}^P$ is expressed in sum notation as follows:

$$\mathbf{M}^P = \left( \sum_i c^P_{1i}\, \top \otimes \mathbf{d}_i \right) + \left( \sum_i c^P_{2i}\, \bot \otimes \mathbf{d}_i \right)$$

The basis weights are defined in terms of the set $M_P$ as follows:
$c^P_{1i} = 1$ if the logical atom $x_i$ associated with basis vector
$\mathbf{d}_i$ is in $M_P$, and 0 otherwise; conversely,
$c^P_{2i} = 1$ if the logical atom $x_i$ associated with basis vector
$\mathbf{d}_i$ is not in $M_P$, and 0 otherwise.

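To give a simple example, let's consider a domain with three
individuals, represented as the following one-hot vectors in
$\mathbf{D}$: $\mathbf{john} = [1\ 0\ 0]^T$,
$\mathbf{chris} = [0\ 1\ 0]^T$, and $\mathbf{tom} = [0\ 0\ 1]^T$. Let's
imagine that Chris and John are mathematicians, but Tom is not. The
predicate $P$ for 'is a mathematician' therefore is represented
model-theoretically as the set $M_P = \{chris, john\}$. Translating
this into a matrix gives the following tensor for $P$:

$$\mathbf{M}^P = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

To compute the truth value of 'John is a mathematician', we perform
predicate-argument application as tensor contraction (matrix-vector
multiplication, in this case):

$$\mathbf{M}^P \times \mathbf{john} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \top$$

Likewise for 'Tom is a mathematician':

$$\mathbf{M}^P \times \mathbf{tom} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \bot$$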
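
The mathematician example can be reproduced with a short sketch. The
helper names `one_hot`, `predicate_matrix` and `contract`, and the
choice of nested lists as the tensor encoding, are assumptions made for
illustration only.

```python
TOP, BOT = [1, 0], [0, 1]   # basis vectors of the boolean space

def contract(t, v):
    """Contract the rightmost index of nested-list tensor t with vector v."""
    if isinstance(t[0], list):
        return [contract(s, v) for s in t]
    return sum(a * b for a, b in zip(t, v))

def one_hot(atom, domain):
    """One-hot vector standing for a single element of the domain."""
    return [1 if x == atom else 0 for x in domain]

def predicate_matrix(extension, domain):
    """Rank 2 tensor in B (x) D: row 0 marks members of M_P, row 1 non-members."""
    top_row = [1 if x in extension else 0 for x in domain]
    bot_row = [1 - w for w in top_row]
    return [top_row, bot_row]

domain = ["john", "chris", "tom"]
mathematician = predicate_matrix({"chris", "john"}, domain)

john_is = contract(mathematician, one_hot("john", domain))
tom_is = contract(mathematician, one_hot("tom", domain))

print(mathematician, john_is == TOP, tom_is == BOT)
```

Building the second row as the complement of the first reflects the
definition of the weights: exactly one of the two weights is 1 for each
element of the domain.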



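Model theory for predicate calculus represents any $n$-ary relation
$R$, such as a verb, as the set $M_R$ of $n$-tuples of elements from
$D$ for which $R$ holds. Therefore such relations can be viewed as
functions $f_R : D^n \to B$ where:

$$f_R(x_1, \ldots, x_n) = \begin{cases} \top & \text{if } (x_1, \ldots, x_n) \in M_R \\ \bot & \text{otherwise} \end{cases}$$

We can represent the boolean function for such a relation $R$ as a
tensor $\mathbf{T}^R$ in
$\mathbf{B} \otimes \mathbf{D} \otimes \ldots \otimes \mathbf{D}$ (with
$n$ copies of $\mathbf{D}$):

$$\mathbf{T}^R = \left( \sum_{\alpha_1 \ldots \alpha_n} c^R_{1\alpha_1 \ldots \alpha_n}\, \top \otimes \mathbf{d}_{\alpha_1} \otimes \ldots \otimes \mathbf{d}_{\alpha_n} \right) + \left( \sum_{\alpha_1 \ldots \alpha_n} c^R_{2\alpha_1 \ldots \alpha_n}\, \bot \otimes \mathbf{d}_{\alpha_1} \otimes \ldots \otimes \mathbf{d}_{\alpha_n} \right)$$

As was the case for predicates, the weights for relational tensors are
defined in terms of the set modelling the relation:
$c^R_{1\alpha_1 \ldots \alpha_n}$ is 1 if the tuple $(x, \ldots, z)$
associated with the basis vectors
$\mathbf{d}_{\alpha_n} \ldots \mathbf{d}_{\alpha_1}$ (again, note the
reverse order) is in $M_R$ and 0 otherwise; and
$c^R_{2\alpha_1 \ldots \alpha_n}$ is 1 if the tuple $(x, \ldots, z)$
associated with the basis vectors
$\mathbf{d}_{\alpha_n} \ldots \mathbf{d}_{\alpha_1}$ is not in $M_R$
and 0 otherwise.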

To give an example involving relations, let our domain be the
individuals John ($j$) and Mary ($m$). Mary loves John and herself, but
John only loves himself. The logical model for this scenario is as
follows:

$$D = \{j, m\} \qquad M_{loves} = \{(j, j), (m, m), (m, j)\}$$

Distributionally speaking, the elements of the domain will be mapped to
the following one-hot vectors in some two-dimensional space
$\mathbf{D}$ as follows: $\mathbf{j} = [1\ 0]^T$ and
$\mathbf{m} = [0\ 1]^T$. The tensor for 'loves' can be written as
follows, ignoring basis elements with null-valued basis weights, and
using the distributivity of the tensor product over addition:

$$\mathbf{T}^{loves} = \top \otimes ((\mathbf{d}_1 \otimes \mathbf{d}_1) + (\mathbf{d}_2 \otimes \mathbf{d}_2) + (\mathbf{d}_1 \otimes \mathbf{d}_2)) + (\bot \otimes \mathbf{d}_2 \otimes \mathbf{d}_1)$$

Computing "Mary loves John" would correspond to the following
calculation:

$$(\mathbf{T}^{loves} \times \mathbf{m}) \times \mathbf{j} = ((\top \otimes \mathbf{d}_2) + (\top \otimes \mathbf{d}_1)) \times \mathbf{j} = \top$$

whereas "John loves Mary" would correspond to the following
calculation:

$$(\mathbf{T}^{loves} \times \mathbf{j}) \times \mathbf{m} = ((\top \otimes \mathbf{d}_1) + (\bot \otimes \mathbf{d}_2)) \times \mathbf{m} = \bot$$
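
The 'loves' example can be checked mechanically. The sketch below
builds the rank 3 tensor for a binary relation from its set of tuples
and evaluates both sentences by successive contraction. The function
name `relation_tensor` and the nested-list layout (truth value, then
second argument, then first argument) are illustrative assumptions
chosen to match the reversed index order noted above.

```python
from itertools import product

TOP, BOT = [1, 0], [0, 1]

def contract(t, v):
    """Contract the rightmost index of nested-list tensor t with vector v."""
    if isinstance(t[0], list):
        return [contract(s, v) for s in t]
    return sum(a * b for a, b in zip(t, v))

def relation_tensor(extension, domain):
    """Rank 3 tensor in B (x) D (x) D for a binary relation.

    The tuple (x, y) is stored at basis element d_y (x) d_x, so the first
    argument sits on the rightmost index and is consumed first.
    """
    n = len(domain)
    tensor = [[[0] * n for _ in range(n)] for _ in range(2)]
    for (ix, x), (iy, y) in product(enumerate(domain), repeat=2):
        truth = 0 if (x, y) in extension else 1   # 0 selects TOP, 1 selects BOT
        tensor[truth][iy][ix] = 1
    return tensor

domain = ["j", "m"]
j, m = [1, 0], [0, 1]
loves = relation_tensor({("j", "j"), ("m", "m"), ("m", "j")}, domain)

mary_loves = contract(loves, m)              # partial application: 'Mary loves _'
mary_loves_john = contract(mary_loves, j)
john_loves_mary = contract(contract(loves, j), m)

print(mary_loves, mary_loves_john == TOP, john_loves_mary == BOT)
```

The intermediate value `mary_loves` is a matrix in B (x) D, i.e. a
unary predicate, which is what makes partial application of relations
possible.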

5 Logical connectives with tensors 

In this section, I discuss how the boolean connectives of a
propositional calculus can be modelled using tensors. Combined with the
predicate and relation representations discussed above, these form a
complete quantifier-free predicate calculus based on tensors and tensor
contraction.

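Negation has already been shown to be modelled in the boolean space
described earlier by Coecke et al. (2010) as the swap matrix:

$$\mathbf{T}^{\neg} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

This can easily be verified:

$$\mathbf{T}^{\neg} \times \top = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \bot$$

$$\mathbf{T}^{\neg} \times \bot = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \top$$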

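All other logical operators are binary, and hence modelled as rank 3
tensors. To make talking about rank 3 tensors used to model binary
operations easier, I will use the following block matrix notation for
$2 \times 2 \times 2$ rank 3 tensors $\mathbf{T}$:

$$\mathbf{T} = \left[ \begin{array}{cc|cc} a_1 & b_1 & a_2 & b_2 \\ c_1 & d_1 & c_2 & d_2 \end{array} \right]$$

which allows us to express tensor contractions as follows:

$$\mathbf{T} \times \mathbf{v} = \left[ \begin{array}{cc|cc} a_1 & b_1 & a_2 & b_2 \\ c_1 & d_1 & c_2 & d_2 \end{array} \right] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} \alpha \cdot a_1 + \beta \cdot a_2 & \alpha \cdot b_1 + \beta \cdot b_2 \\ \alpha \cdot c_1 + \beta \cdot c_2 & \alpha \cdot d_1 + \beta \cdot d_2 \end{bmatrix}$$

or more concretely:

$$\mathbf{T} \times \top = \left[ \begin{array}{cc|cc} a_1 & b_1 & a_2 & b_2 \\ c_1 & d_1 & c_2 & d_2 \end{array} \right] \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} a_1 & b_1 \\ c_1 & d_1 \end{bmatrix}$$

$$\mathbf{T} \times \bot = \left[ \begin{array}{cc|cc} a_1 & b_1 & a_2 & b_2 \\ c_1 & d_1 & c_2 & d_2 \end{array} \right] \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} a_2 & b_2 \\ c_2 & d_2 \end{bmatrix}$$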

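Using this notation, we can define tensors for the following
operations:

$$(\lor) \mapsto \mathbf{T}^{\lor} = \left[ \begin{array}{cc|cc} 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right]$$

$$(\land) \mapsto \mathbf{T}^{\land} = \left[ \begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{array} \right]$$

$$(\to) \mapsto \mathbf{T}^{\to} = \left[ \begin{array}{cc|cc} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \end{array} \right]$$

I leave the trivial proof by exhaustion that these fit the bill to the
reader.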
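
The proof by exhaustion is easy to mechanise. The sketch below encodes
the block notation as nested lists and checks every row of each truth
table; the helper names `from_blocks`, `vec` and `binary` are
hypothetical and only serve this illustration.

```python
from itertools import product

TOP, BOT = [1, 0], [0, 1]

def contract(t, v):
    """Contract the rightmost index of nested-list tensor t with vector v."""
    if isinstance(t[0], list):
        return [contract(s, v) for s in t]
    return sum(a * b for a, b in zip(t, v))

def from_blocks(left, right):
    """Build a 2x2x2 tensor from the block notation [left | right].

    Contracting the result with [a, b] yields a * left + b * right.
    """
    return [[[left[i][j], right[i][j]] for j in range(2)] for i in range(2)]

T_NOT = [[0, 1], [1, 0]]
T_OR = from_blocks([[1, 1], [0, 0]], [[1, 0], [0, 1]])
T_AND = from_blocks([[1, 0], [0, 1]], [[0, 0], [1, 1]])
T_IMP = from_blocks([[1, 0], [0, 1]], [[1, 1], [0, 0]])

def vec(value):
    """Boolean value to truth vector."""
    return TOP if value else BOT

def binary(tensor, p, q):
    """Evaluate (T x p) x q for boolean p and q."""
    return contract(contract(tensor, vec(p)), vec(q))

checks = []
for p, q in product([True, False], repeat=2):
    checks.append(binary(T_OR, p, q) == vec(p or q))
    checks.append(binary(T_AND, p, q) == vec(p and q))
    checks.append(binary(T_IMP, p, q) == vec((not p) or q))
for p in [True, False]:
    checks.append(contract(T_NOT, vec(p)) == vec(not p))

print(all(checks))
```

Each binary connective is applied to its left operand first, which
yields a 2 by 2 matrix that is then applied to the right operand.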

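It is worth noting here that these tensors preserve normalised
probabilities of truth. Let us consider a model such as that described
in Coecke et al. (2010) which, in lieu of boolean truth values,
represents truth value vectors of the form $[\alpha\ \beta]^T$ where
$\alpha + \beta = 1$. Applying the above logical operations to such
vectors produces vectors with the same normalisation property. This is
due to the fact that the columns of the component matrices are all
normalised (i.e. each column sums to 1). To give an example with
conjunction, let $\mathbf{v} = [\alpha_1\ \beta_1]^T$ and
$\mathbf{w} = [\alpha_2\ \beta_2]^T$ with
$\alpha_1 + \beta_1 = \alpha_2 + \beta_2 = 1$. The conjunction of these
vectors is calculated as follows:

$$(\mathbf{T}^{\land} \times \mathbf{v}) \times \mathbf{w} = \left( \left[ \begin{array}{cc|cc} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{array} \right] \begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix} \right) \begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} \alpha_1 & 0 \\ \beta_1 & \alpha_1 + \beta_1 \end{bmatrix} \begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} \alpha_1 \alpha_2 \\ \beta_1 \alpha_2 + (\alpha_1 + \beta_1) \beta_2 \end{bmatrix}$$

To check that the probabilities are normalised we calculate:

$$\alpha_1 \alpha_2 + \beta_1 \alpha_2 + (\alpha_1 + \beta_1) \beta_2 = (\alpha_1 + \beta_1) \alpha_2 + (\alpha_1 + \beta_1) \beta_2 = (\alpha_1 + \beta_1)(\alpha_2 + \beta_2) = 1$$

We can observe that the resulting probability distribution for truth is
still normalised. The same property can be verified for the other
connectives, which I leave as an exercise for the reader.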
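
The exercise can be spot-checked numerically. The following sketch
applies each binary connective to every pair of normalised truth
vectors taken from a small grid of probabilities and collects the sums
of the resulting vectors. The grid, the use of exact fractions and the
helper names are assumptions made for illustration; a finite check of
this kind supports the algebraic argument but does not replace it.

```python
from fractions import Fraction
from itertools import product

def contract(t, v):
    """Contract the rightmost index of nested-list tensor t with vector v."""
    if isinstance(t[0], list):
        return [contract(s, v) for s in t]
    return sum(a * b for a, b in zip(t, v))

def from_blocks(left, right):
    """Build a 2x2x2 tensor from the block notation [left | right]."""
    return [[[left[i][j], right[i][j]] for j in range(2)] for i in range(2)]

CONNECTIVES = {
    "or": from_blocks([[1, 1], [0, 0]], [[1, 0], [0, 1]]),
    "and": from_blocks([[1, 0], [0, 1]], [[0, 0], [1, 1]]),
    "implies": from_blocks([[1, 0], [0, 1]], [[1, 1], [0, 0]]),
}

grid = [Fraction(n, 4) for n in range(5)]   # truth probabilities 0, 1/4, ..., 1

# For each connective, the set of all observed sums of the output vector.
sums = {
    name: {sum(contract(contract(tensor, [a, 1 - a]), [b, 1 - b]))
           for a, b in product(grid, repeat=2)}
    for name, tensor in CONNECTIVES.items()
}

conj = contract(contract(CONNECTIVES["and"], [Fraction(1, 4), Fraction(3, 4)]),
                [Fraction(1, 2), Fraction(1, 2)])

print(sums, conj)
```

For the conjunction example, the first component is the product of the
two truth probabilities, matching the closed form derived above.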

6 Quantifiers and non-linearity 

The predicate calculus described up until this point has repeatedly
been qualified as 'quantifier-free', for the simple reason that
quantification cannot be modelled if each application of a predicate or
relation immediately yields a truth value. In performing such
reductions, we throw away the information required for quantification,
namely the information which indicates which elements of a domain the
predicate holds true or false for. In this section, I present a variant
of the predicate calculus developed earlier in this paper which allows
us to model simple quantification (i.e. excluding embedded quantifiers)
alongside a tensor-based approach to predicates. However, I will prove
that this approach to quantifier modelling relies on non-linear
functions, rendering it unsuitable for compositional distributional
models relying solely on multilinear maps for composition (or
alternatively, rendering such models unsuitable for the modelling of
quantifiers by this method).

We saw, in Section 4, that vectors in the semantic space $\mathbf{D}$
standing for the logical domain could model logical atoms as well as
sets of atoms. With this in mind, instead of modelling a predicate $P$
as a truth-function, let us now view it as standing for some function
$f_P : \mathcal{P}(D) \to \mathcal{P}(D)$, defined as:

$$f_P(X) = X \cap M_P$$

where $X$ is a set of domain objects, and $M_P$ is the set modelling
the predicate. The tensor form of such a function will be some
$\mathbf{T}^{f_P}$ in $\mathbf{D} \otimes \mathbf{D}$. Let this square
matrix be a diagonal matrix such that basis weights
$c^{f_P}_{ii} = 1$ if the atom $x$ corresponding to $\mathbf{d}_i$ is
in $M_P$ and 0 otherwise. Through tensor contraction, this tensor maps
subsets of $D$ (elements of $\mathbf{D}$) to subsets of $D$ containing
only those objects of the original subset for which $P$ holds (i.e.
yielding another vector in $\mathbf{D}$).

To give an example: let us consider a domain with two dogs ($a$ and
$b$) and a cat ($c$). One of the dogs ($b$) is brown, as is the cat.
Let $S$ be the set of dogs, and $P$ the predicate "brown". I represent
these statements in the model as follows:

$$D = \{a, b, c\} \qquad S = \{a, b\} \qquad M_P = \{b, c\}$$

The set of dogs is represented as a vector
$\mathbf{S} = [1\ 1\ 0]^T$ and the predicate 'brown' as a tensor in
$\mathbf{D} \otimes \mathbf{D}$:

$$\mathbf{T}^P = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The set of brown dogs is obtained by computing $f_P(S)$, which
distributionally corresponds to applying the tensor $\mathbf{T}^P$ to
the vector representation of $S$ via tensor contraction, as follows:

$$\mathbf{T}^P \times \mathbf{S} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \mathbf{b}$$

The result of this computation shows that the set of brown dogs is the
singleton set containing the only brown dog, $b$. As for how logical
connectives fit into this picture, in both approaches discussed below,
conjunction and disjunction are modelled using set-theoretic
intersection and union, which are simply the component-wise min and max
functions over vectors, respectively.

Using this new way of modelling predicates as tensors, I turn to the
problem of modelling quantification. We begin by putting all predicates
in vector form by replacing each instance of the bound variable with a
vector $\mathbf{1}$ filled with ones, which extracts the diagonal from
the predicate matrix.

An intuitive way of modelling universal quantification is as follows:
expressions of the form "All Xs are Ys" are true if and only if
$M_X = M_X \cap M_Y$, where $M_X$ and $M_Y$ are the set of Xs and the
set of Ys, respectively. Using this, we can define the map $forall$ for
distributional universal quantification modelling expressions of the
form "All Xs are Ys" as follows:

$$forall(\mathbf{X}, \mathbf{Y}) = \begin{cases} \top & \text{if } \mathbf{X} = \min(\mathbf{X}, \mathbf{Y}) \\ \bot & \text{otherwise} \end{cases}$$

To give a short example, the sentence 'All Greeks are human' is
verified by computing
$\mathbf{X} = (\mathbf{M}^{greek} \times \mathbf{1})$,
$\mathbf{Y} = (\mathbf{M}^{human} \times \mathbf{1})$, and verifying
the equality $\mathbf{X} = \min(\mathbf{X}, \mathbf{Y})$.
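
The set-valued view of predicates and the forall map can be sketched as
follows, reusing the dog and cat domain from the example above. The
helper names `set_predicate` and `as_vector` are hypothetical, and the
predicate 'animal' is an extra assumption added only to obtain one
universally quantified sentence that holds in the model.

```python
TOP, BOT = [1, 0], [0, 1]

def contract(t, v):
    """Contract the rightmost index of nested-list tensor t with vector v."""
    if isinstance(t[0], list):
        return [contract(s, v) for s in t]
    return sum(a * b for a, b in zip(t, v))

def set_predicate(extension, domain):
    """Diagonal matrix in D (x) D with 1 on the diagonal for members of M_P."""
    return [[1 if (i == j and x in extension) else 0
             for j in range(len(domain))]
            for i, x in enumerate(domain)]

def as_vector(tensor):
    """Contract with the all-ones vector, extracting the diagonal."""
    return contract(tensor, [1] * len(tensor))

def forall(x, y):
    """TOP iff x equals the component-wise minimum of x and y."""
    return TOP if x == [min(a, b) for a, b in zip(x, y)] else BOT

domain = ["a", "b", "c"]
dog = set_predicate({"a", "b"}, domain)
brown = set_predicate({"b", "c"}, domain)
animal = set_predicate({"a", "b", "c"}, domain)   # hypothetical extra predicate

brown_dogs = contract(brown, [1, 1, 0])           # apply 'brown' to the set of dogs
all_dogs_are_animals = forall(as_vector(dog), as_vector(animal))
all_dogs_are_brown = forall(as_vector(dog), as_vector(brown))

print(brown_dogs, all_dogs_are_animals, all_dogs_are_brown)
```

Because the predicate matrices are diagonal, applying one to a set
vector simply zeroes out the elements that fall outside the predicate's
extension.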

Existential statements of the form "There exists X" can be modelled
using the function $exists$, which tests whether or not $M_X$ is empty,
and is defined as follows:

$$exists(\mathbf{X}) = \begin{cases} \top & \text{if } |\mathbf{X}| > 0 \\ \bot & \text{otherwise} \end{cases}$$

To give a short example, the sentence 'there exists a brown dog' is
verified by computing
$\mathbf{X} = (\mathbf{M}^{brown} \times \mathbf{1}) \cap (\mathbf{M}^{dog} \times \mathbf{1})$
and verifying whether or not $\mathbf{X}$ is of strictly positive
length.
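
A minimal sketch of the exists map follows, under the assumption that
the length of a set vector is measured by the sum of the absolute
values of its weights (any norm would give the same test for
emptiness). The vectors below are the diagonals of the 'brown' and
'dog' matrices over the domain {a, b, c} used earlier; the 'cat' vector
and the helper `intersect` are illustrative additions.

```python
TOP, BOT = [1, 0], [0, 1]

def exists(x):
    """TOP when the set vector x has at least one non-zero weight."""
    return TOP if sum(abs(w) for w in x) > 0 else BOT

def intersect(x, y):
    """Set intersection as the component-wise minimum of two set vectors."""
    return [min(a, b) for a, b in zip(x, y)]

# Diagonals (M x 1) of the predicate matrices over the domain {a, b, c}.
brown = [0, 1, 1]
dog = [1, 1, 0]
cat = [0, 0, 1]

there_is_a_brown_dog = exists(intersect(brown, dog))
there_is_a_dog_cat = exists(intersect(dog, cat))

print(there_is_a_brown_dog, there_is_a_dog_cat)
```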

An important point to note here is that neither of these quantification
functions is a multilinear map, since a multilinear map must be linear
in all arguments. A counterexample for $forall$ is to consider the case
where $M_X$ and $M_Y$ are empty, and multiply their vector
representations by non-zero scalar weights $\alpha$ and $\beta$ such
that $\alpha\beta \neq 1$:

$$\alpha\mathbf{X} = \mathbf{X}$$
$$\beta\mathbf{Y} = \mathbf{Y}$$
$$forall(\alpha\mathbf{X}, \beta\mathbf{Y}) = forall(\mathbf{X}, \mathbf{Y}) = \top$$
$$forall(\alpha\mathbf{X}, \beta\mathbf{Y}) \neq \alpha\beta\top$$

I observe that the equations above demonstrate that $forall$ is not a
multilinear map.

The proof that $exists$ is not a multilinear map is equally trivial.
Assume $M_X$ is an empty set and $\alpha$ is a non-zero scalar weight
other than 1:

$$\alpha\mathbf{X} = \mathbf{X}$$
$$exists(\alpha\mathbf{X}) = exists(\mathbf{X}) = \bot$$
$$exists(\alpha\mathbf{X}) \neq \alpha\bot$$

It follows that $exists$ is not a multilinear function.
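
Both counterexamples can be replayed numerically. In the sketch below
the scalars 2 and 3 and the three-dimensional empty set vector are
arbitrary choices; a multilinear map would have to scale its output by
the same factors as its inputs, which neither function does.

```python
TOP, BOT = [1, 0], [0, 1]

def forall(x, y):
    """TOP iff x equals the component-wise minimum of x and y."""
    return TOP if x == [min(a, b) for a, b in zip(x, y)] else BOT

def exists(x):
    """TOP when the set vector x has at least one non-zero weight."""
    return TOP if sum(abs(w) for w in x) > 0 else BOT

def scale(c, v):
    """Multiply every component of the vector v by the scalar c."""
    return [c * w for w in v]

empty = [0, 0, 0]        # vector for the empty set
alpha, beta = 2, 3       # arbitrary non-zero scalars with alpha * beta != 1

lhs_forall = forall(scale(alpha, empty), scale(beta, empty))
rhs_forall = scale(alpha * beta, forall(empty, empty))

lhs_exists = exists(scale(alpha, empty))
rhs_exists = scale(alpha, exists(empty))

print(lhs_forall, rhs_forall, lhs_exists, rhs_exists)
```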

7 Conclusions and future work 

In this paper, I set out to demonstrate that it was possible to
replicate most aspects of predicate logic using tensor-based models. I
showed that tensors can be constructed from logical models to represent
predicates and relations, with vectors encoding elements or sets of
elements from the logical domain. I discussed how tensor contraction
allows for evaluation of logical expressions encoded as tensors, and
that logical connectives can be defined as tensors to form a full
quantifier-free predicate calculus. I exposed some of the limitations
of this approach when dealing with variables under the scope of
quantifiers, and proposed a variant for the tensor representation of
predicates which allows us to deal with quantification. Further work on
tensor-based modelling of quantifiers should ideally seek to reconcile
this work with that of Barwise and Cooper (1981). In this section, I
discuss how both of these approaches to predicate modelling can be put
into relation, and suggest further work that might be done on this
topic, and on the topic of integrating this work into compositional
distributional models of semantics.

The first approach to predicate modelling treats predicates as truth
functions represented as tensors, while the second treats them as
functions from subsets of the domain to subsets of the domain. Yet both
representations of predicates contain the same information. Let
$\mathbf{M}^P$ and $\mathbf{M}'^P$ be the tensor representations of a
predicate $P$ under the first and second approach, respectively. The
relation between these representations lies in the equality
$diag(\mathbf{p}\mathbf{M}^P) = \mathbf{M}'^P$, where $\mathbf{p}$ is
the covector $[1\ 0]$ (and hence $\mathbf{p}\mathbf{M}^P$ yields the
first row of $\mathbf{M}^P$). The second row of $\mathbf{M}^P$ being
defined in terms of the first, one can also recover $\mathbf{M}^P$ from
the diagonal of $\mathbf{M}'^P$.

Furthermore, both approaches deal with separate 
aspects of predicate logic, namely applying predi- 
cates to logical atoms, and applying them to bound 
variables. With this in mind, it is possible to see how 
both approaches can be used sequentially by noting 
that tensor contraction allows for partial application 
of relations to logical atoms. For example, apply- 
ing a binary relation to its first argument under the 
first tensor-based model yields a predicate. Translat- 
ing this predicate into the second model's form using 
the equality defined above then permits us to use it 
in quantified expressions. Using this, we can eval- 
uate expressions of the form "There exists someone 
who John loves". Future work in this area should 
therefore focus on developing a version of this ten- 
sor calculus which permits seamless transition be- 
tween both tensor formulations of logical predicates. 
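
The sequential use of both models can be sketched for the sentence
"There exists someone who John loves". The code below takes the 'loves'
tensor from Section 4, partially applies it to John under the first
model, converts the resulting predicate to the second model's diagonal
form, and then tests for existence. The function names `to_set_form`
and `to_truth_form` are hypothetical, and the nested-list layout of the
tensor (truth value, then object, then subject) is an assumption of
this illustration.

```python
TOP, BOT = [1, 0], [0, 1]

def contract(t, v):
    """Contract the rightmost index of nested-list tensor t with vector v."""
    if isinstance(t[0], list):
        return [contract(s, v) for s in t]
    return sum(a * b for a, b in zip(t, v))

def to_set_form(m):
    """Second-model form diag(p M) with p = [1 0]: first row of M on a diagonal."""
    p = [1, 0]
    n = len(m[0])
    first_row = [sum(p[b] * m[b][i] for b in range(2)) for i in range(n)]
    return [[first_row[i] if i == j else 0 for j in range(n)] for i in range(n)]

def to_truth_form(m_prime):
    """Recover the first-model matrix from the diagonal of the second-model one."""
    top_row = [m_prime[i][i] for i in range(len(m_prime))]
    return [top_row, [1 - w for w in top_row]]

def exists(x):
    """TOP when the set vector x has at least one non-zero weight."""
    return TOP if sum(abs(w) for w in x) > 0 else BOT

# The 'loves' tensor from Section 4, indexed [truth value][object][subject].
loves = [[[1, 1], [0, 1]], [[0, 0], [1, 0]]]
john = [1, 0]

loved_by_john = contract(loves, john)     # first model: a predicate in B (x) D
as_set = to_set_form(loved_by_john)       # second model: diagonal matrix in D (x) D
someone_john_loves = exists(contract(as_set, [1, 1]))

print(loved_by_john, as_set, someone_john_loves)
```

The round trip through `to_truth_form` returns the original matrix,
illustrating that both representations carry the same information.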

Finally, this paper aims to provide a starting point for the
integration of logical aspects into compositional distributional
semantic models. The work presented here serves to illustrate how
tensors can simulate logical elements and operations, but does not
address (or seek to address) the fact that the vectors and matrices in
most compositional distributional semantic models do not cleanly
represent elements of a logical domain. However, such distributional
representations can arguably be seen as representing the properties
objects of a logical domain hold in a corpus: for example the similar
distributions of 'car' and 'automobile' could serve to indicate that
these concepts are co-extensive. This suggests two directions research
based on this paper could take. One could use the hypothesis that
similar vectors indicate co-extensive concepts to infer a
(probabilistic) logical domain and set of predicates, and use the
methods described above without modification; alternatively one could
use the form of the logical operations and predicate tensors described
in this paper as a basis for a higher-dimensional predicate calculus,
and investigate how such higher-dimensional 'logical' operations and
elements could be defined or learned. Either way, reconciling the fuzzy
'messiness' of distributional models with the sharp 'cleanliness' of
logic is a difficult problem, but I hope to have demonstrated in this
paper that a small step has been made in the right direction.